Abstract

An empirical law for the rank-order behavior of journal impact factors is
found. Using an extensive database of impact factors covering journals in
Education, Agrosciences, Geosciences, Mathematics, Chemistry, Medicine,
Engineering, Physics, Biosciences, and Environmental, Computer and
Material Sciences, we obtain extremely good fits that outperform other
rank-order models. Based on our results we propose a two-exponent Lotkaian
Informetrics. Some extensions to other areas of knowledge are discussed.

1 Introduction 

Quantitative studies in linguistics have a long lineage. Due to the extreme
complexity of languages, these studies have been based mainly on the
statistical properties of words in literary corpora. Outstanding early
examples are the works of J. B. Estoup (1916), G. Dewey (1923) and
E. V. Condon (1928). However, the most influential contribution on this
topic is that of G. K. Zipf (1949). His work contains what is known today as
Zipf's law, which can be formulated as follows. Let f(r), r = 1, ..., N, be
the relative frequencies of the words in a text, arranged in decreasing
order. Then Zipf's law states that:

$$ f(r) = \frac{K}{r^{a}} \qquad (1) $$

In this case the items are words taken from a given text: the most abundant
word takes the first place (r = 1), the second most abundant takes the next
place (r = 2), and so on. Because the law is a power law with a negative
exponent, it appears as a straight line with slope -a when plotted on
log-log scales. K is a proportionality constant with no phenomenological
interest. This empirical law has found applications in a wide range of
natural and human phenomena (Li, 2003). The case a ≈ 1 is of particular
interest because it implies self-similarity. The exact mechanism behind
Zipf's law remains a mystery. However, it is important to remark that the
presence of power laws implies in general that the underlying mechanism is
neither stochastic nor regular. Power laws are the signature of correlated
noise, possibly associated with "edge of chaos" dynamics (REF), or could be
a clue to self-organized criticality (Bak et al., 1989).
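
The straight-line behavior can be checked in one step. As a short worked
derivation, writing Zipf's law as f(r) = K / r^a and taking logarithms of
both sides (the base of the logarithm is an arbitrary choice here and does
not affect the slope):

```latex
\log f(r) = \log K - a \log r
```

A plot of log f(r) against log r therefore has intercept log K and slope -a.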

The main drawback of Zipf's law is its poor fit at very high and very low
frequencies in the word-counting problem. An improvement over Zipf's law
was proposed by B. Mandelbrot (1954):

$$ f(r) = \left( \frac{N + \rho}{r + \rho} \right)^{1 + \varepsilon} \qquad (2) $$

where N is the number of different words in the text and ρ and ε are
parameters to be adjusted.

Zipf's law is a special case of Mandelbrot's. This fact, along with a
complete discussion of the role of power laws in the field of Informetrics,
can be found in Egghe (2005).

Recently it has been reported (Le Quan; 2002) that what Zipf found is valid
for small corpora (texts of the size that could be analyzed at that time),
and that now that computers allow the analysis of huge texts, the log-log
plot shows a clear downward-bending tail instead of the predicted straight
line.

Scientific productivity is another topic in which the first studies date
back almost a century, to the works of A. Dresden (1922) and A. J. Lotka
(1926). Lotka's law has the same mathematical form as Eq. (1), but it
already introduced bibliometric variables, taking r to be the contributors
or authors of papers and f(r) to be the articles or papers themselves.
Since Lotka, it has been common to call the independent variable "sources"
and the dependent one "items". Thus Lotka's law states that the number of
items is a power law of the sources. The branch of Informetrics concerned
with the study of power laws is called Lotkaian Informetrics (Egghe, 2005).

Informetrics deals mainly with the relationships between sources and items.
Typical source-item pairs are authors-journals or journals-bibliographies.
In this paper we explore the possibility of extending Lotkaian Informetrics
to the realm of Journal Impact Factors (JIFs). We also show that rank-order
JIF plots deviate from a traditional Lotkaian equation, and we propose an
extension to what could be called two-exponent Lotkaian laws.

2 Impact Factors 

The impact factor is a measure of the frequency with which the "average
article" in a journal has been cited in a particular year or period
(Garfield; 1994). It is calculated "by dividing the number of times a
journal has been cited by the number of articles it has published during
some specific period of time. The journal impact factor will thus reflect
an average citation rate per published article" (Garfield; 1955). The
impact factor is an attempt to evaluate the knowledge production published
across the different journals of a given field. Covered mainly by the
Science Citation Index database, it has been published annually since 1975
in the Journal Citation Reports.
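
The ratio in Garfield's definition can be illustrated with a minimal sketch.
The function name `impact_factor` and the citation and article counts below
are hypothetical, chosen only to show the arithmetic; they do not come from
the Journal Citation Reports.

```python
def impact_factor(citations: int, articles: int) -> float:
    """Average citation rate per published article over one period."""
    if articles <= 0:
        raise ValueError("articles must be positive")
    return citations / articles


# Hypothetical journal: 300 citations to 120 articles in the chosen period.
example_if = impact_factor(300, 120)
print(example_if)  # 2.5
```

Ranking the journals of a field by this value, from largest to smallest,
gives the rank-order variable r used in the rest of the paper.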

JIFs have been the target of much criticism (Soegler; 1997, Frohlich, 1996),
and there is a debate about their use as a tool to evaluate research. Even
the influential journal Nature states that JIF figures should be handled
carefully (Nature; 2005). Regardless of its pros and cons, the fact is that
the JIF is an everyday measure of the importance of a journal and is used
worldwide (de Marchi and Rocchi; 2001).

While keeping a skeptical attitude towards the use of JIFs to evaluate
scientific research, it should be recognized that the JIF is an outcome of
the publication process and has itself become a subject of scientific study.

The rank-order distribution of JIFs attracted the attention of D. Lavalette,
who (as mentioned in Popescu (2003)) proposed the following law:

$$ f(r) = K \left( \frac{N + 1 - r}{r} \right)^{b} \qquad (3) $$

where N is the number of journals, r is the rank, f(r) is the impact factor,
and b is a parameter to be fitted.

In the next section we propose a law that outperforms Lavalette's (see 
Concluding Remarks). 



3 Analytical expression of the law 

Figure 1 shows the log-log plot of the impact factors of a field taken at
random from Popescu's database (2003).



Figure 1 



It is evident that this is not a power law, because of the bending tail on
the right side of the plot. This fact motivated us to propose a Beta-like
function:

$$ f(r) = K \frac{(N + 1 - r)^{b}}{r^{a}} \qquad (4) $$

Here f(r), r = 1, ..., N, represents the rank-ordered impact factors, and
K, a and b are three parameters to be fitted. K is a meaningless scaling
factor. Notice that when b = 0 this equation becomes Lotka's law.
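
The two-exponent form, f(r) = K (N + 1 - r)^b / r^a, can be evaluated
numerically with a minimal sketch. The function name `two_exponent_law` and
the parameter values below are assumptions made for illustration; they are
not fitted to any data set.

```python
import numpy as np


def two_exponent_law(r, N, K, a, b):
    """Evaluate f(r) = K * (N + 1 - r)**b / r**a for ranks r in 1..N."""
    r = np.asarray(r, dtype=float)
    return K * (N + 1 - r) ** b / r ** a


N = 100
ranks = np.arange(1, N + 1)

# Illustrative parameter values only; they are not fitted to any data.
curve = two_exponent_law(ranks, N, K=1.0, a=0.5, b=0.8)

# With b = 0 the factor (N + 1 - r)**b equals 1, leaving the power law K / r**a.
power_law = two_exponent_law(ranks, N, K=1.0, a=0.5, b=0.0)
assert np.allclose(power_law, 1.0 / ranks ** 0.5)
```

The factor (N + 1 - r)^b is what bends the tail: it is close to N^b at low
ranks and falls to 1 at r = N.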



4 Results 

For every data set, we find the parameter values using a linear
least-squares method on the logarithmic variable:

$$ \log f(r) = \log K + b \log(N + 1 - r) - a \log r \qquad (5) $$
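
The log-linear fit can be sketched as an ordinary least-squares problem with
regressors log(N + 1 - r) and -log(r). This is one possible implementation
using NumPy, not necessarily the procedure used to produce Table 1. The
function name `fit_two_exponent_law` is hypothetical, and r^2 is computed
here as the coefficient of determination on the logarithmic scale, which is
an assumption about how the reported value was obtained.

```python
import numpy as np


def fit_two_exponent_law(values):
    """Fit log f = log K + b*log(N+1-r) - a*log(r) by linear least squares.

    `values` are positive observations (e.g. impact factors). They are sorted
    in decreasing order so that rank r = 1 is the largest value.
    Returns (K, a, b, r_squared).
    """
    f = np.sort(np.asarray(values, dtype=float))[::-1]
    if np.any(f <= 0):
        raise ValueError("all values must be positive to take logarithms")
    N = f.size
    r = np.arange(1, N + 1)
    y = np.log(f)
    # Design matrix columns: intercept (log K), coefficient b, coefficient a.
    X = np.column_stack([np.ones(N), np.log(N + 1 - r), -np.log(r)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    log_K, b, a = coef
    residuals = y - X @ coef
    r_squared = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return np.exp(log_K), a, b, r_squared


# Synthetic check: data generated exactly from the model should be recovered.
N = 200
ranks = np.arange(1, N + 1)
synthetic = 3.0 * (N + 1 - ranks) ** 0.6 / ranks ** 0.4
K_hat, a_hat, b_hat, r2 = fit_two_exponent_law(synthetic)
```

Because the fit is done on log f(r), it minimizes relative rather than
absolute deviations, so low-ranked journals with small impact factors weigh
as much as the top of the list.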

Table 1 shows the values of K, b and a, as well as the coefficient of
regression r^2, for the impact factors of twelve disciplines. Figs. 2, 3 and
4 show the impact factor data together with our theoretical curve for the
fields of Physical, Mathematical and Environmental Sciences. We used semilog
plots because they are more natural when the abscissa is a rank-order
variable.

Table 1. 



Figures 2, 3 and 4. 

5 Concluding remarks 

We have shown the excellent agreement of the data with our model. The
quality of the fit is superior to that of Lavalette's proposal. From a
comparison of Eqs. (3) and (4), it is evident that Lavalette's law is a
particular case of ours, obtained when a = b. Unfortunately, it is not
possible to discuss the rationale behind Lavalette's law because the
original paper is not available, and all we know about it is a mention in
Popescu's paper (2003).
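
The special-case relation can be written out in one line. As a short worked
step, setting a = b in the two-exponent law f(r) = K (N + 1 - r)^b / r^a
gives the following (reading Lavalette's law in the ratio form shown on the
right is an assumption, since the original paper is not available):

```latex
f(r) \Big|_{a = b} = K \, \frac{(N + 1 - r)^{b}}{r^{b}}
                   = K \left( \frac{N + 1 - r}{r} \right)^{b}
```

The two-exponent law therefore has one more free parameter than Lavalette's
and can never fit worse in the least-squares sense.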

The mechanisms proposed to explain the behaviors discussed above often
assume a kind of "biological evolution" form. For instance, G. Yule (1924),
working on a model suggested by J. Willis (1922), managed to prove that a
power-law behavior is obtained by assuming a single ancestral species
together with probabilities of mutation and duplication.
Expansion-modification systems proposed by W. Li (1991), which take into
account the basic features of DNA mutation processes (R. Mansilla and Cocho,
2000), are also able to predict this behavior.

In the case of journal impact factors, a balance between the importance to
researchers of publishing their work in highly ranked journals, the
difficulty of doing so, and the increase in impact received by journals with
high impact factors seems to create a "rich get richer" mechanism (the
"Matthew Effect"; see Merton, 1968 and Egghe and Rousseau, 1999), also
observed in complex networks (Barabasi, 2002). More than 49 years ago,
H. Simon (1955, 1957) proposed a model that produces similar distributions.
It is also interesting to notice that the bending of the tail of the JIF
rank-order distribution means that, beyond a critical zone of JIF values,
the distribution is smooth, thus discarding the possibility of
multifractality.

Power laws seem to be ubiquitous in physics, biology, geography, economics,
linguistics, etcetera (see Li, W., 2003). We consider "linguistic studies"
to include not only those concerned with natural languages but also those of
arbitrary languages over abstract finite alphabets. When the number of
possible "words" is large, as is the case for natural languages, a good fit
with a one-parameter power law is expected. However, when the number of
words is rather small, as is the case for programming languages,
one-exponent power laws fail and more parameters are necessary for a
suitable fit. New elements for these considerations have been given by
LeQuan et al. (2002), who showed that there is a serious deviation when the
size of the sample is huge.

We expect that the increase in computing power will show that the deviation
from Zipf's and Lotka's laws is a generic phenomenon. A two-exponent
Lotkaian and Zipfian informetrics and linguistics would then be welcome.

Figure 2: Semi-log impact factor rank-order distribution for Physics
journals. Solid circles represent raw data. Hollow circles are the output of
the model.

Figure 3: Semi-log impact factor rank-order distribution for Mathematics
journals. Solid circles represent raw data. Hollow circles are the output of
the model.

Figure 4: Semi-log impact factor rank-order distribution for Environmental
sciences. Solid circles represent raw data. Hollow circles are the output of
the model.